The study describes levels of thinking in regard to the design of statistical studies. 
Clinical interviews were conducted with 15 students who were enrolled in high 
school or were recent high school graduates, and who represented a range of 
mathematical backgrounds. During the clinical interview sessions students were 
asked how they would go about designing studies to answer several different 
quantifiable questions. Several levels of sophistication were identified in their 
responses, and are discussed in terms of the Biggs and Collis (1982, 1991) cognitive 
model. 


Study design is foundational to the practice of statistics. Cobb and Moore 
(1997) underscored this point with the following statement: 

Statistical ideas for producing data to answer specific questions are the most 
influential contributions of statistics to human knowledge. Badly designed data 
production is the most common serious flaw in statistical studies. Well designed 
data production allows us to apply standard methods of analysis and reach clear 
conclusions. (p. 807) 

Wild and Pfannkuch (1999) also emphasized the importance of study design, 
stating that it is an indispensable part of the overall process of statistical thinking. 
Recognizing the importance of study design, the National Council of Teachers of 
Mathematics (NCTM, 2000) recommended that students should begin to have 
experiences in designing simple studies during their preschool years and develop 
increasingly more sophisticated study design strategies throughout their years of 
formal schooling. Similar recommendations appear in other curriculum documents 
(e.g., Australian Education Council, 1991). 

Purpose of the Study 

Since the topic of study design forms an important part of statistics education, 
the need exists to understand students' patterns of thinking in response to 
statistical study design tasks. Research that describes students' cognition in regard 
to mathematical topics has the potential to help improve the teaching of the topics 
(Even & Tirosh, 2002; Fennema & Franke, 1992). The purpose of the present study 
was to contribute to the knowledge base concerning students' understanding of 
study design by focusing on high school students (high school in the United States, 
where the present study was conducted, generally includes students 14-18 years 
old). The following two research questions were addressed. 

1. What are the defining characteristics of high school students' patterns of 
response to statistical study design tasks? 

2. What cognitive level can be associated with each of the patterns of 
response identified? 
Previous Research on Students' Knowledge of Study Design 

This section presents research that provides some insight about students' 
abilities to design statistical studies. In carrying out the present study, special 
attention was paid to whether or not issues encountered in the literature arose 
among the students studied. Since the focus of the present study is upon the high 
school level, the research literature discussed includes descriptions of the thinking 
of students at or near the high school level. 

Watson and Moritz (2000a) investigated Grade 3-11 Australian students' 
abilities to detect bias when considering statistical samples. They found that 
students ranged in sophistication from those who offered no criticism of situations 
in which bias would naturally occur to those who recognised the need for samples 
to be representative and unbiased. The ability to detect bias seemed to be related to 
grade level. When Watson and Moritz studied some of the same students two to 
four years later, they found that students tended to improve by one or two levels of 
sophistication in thinking. The results of their study showed that students do not 
always recognise that unrepresentative and biased samples produce undesirable 
results, but that their ability to detect bias seems to improve as they progress 
through school. 

Data supporting the finding that the ability to detect bias is related to grade 
level occur in United States students' responses to an item on the 1996 National 
Assessment of Educational Progress (NAEP). Whereas approximately half of 
eighth grade students responded correctly to an item designed to assess their 
ability to recognise the potential for sample bias, approximately 75% of Grade 12 
students responded correctly to the same item (Zawojewski & Shaughnessy, 2000). 
This finding suggests that the ability to detect bias in study design is present more 
frequently among older students. 

In order to design an effective statistical study it is not sufficient simply to 
recognise that samples need to be representative and unbiased. One must also use 
methods with the potential to produce such samples. Watson and Moritz (2000b) 
conducted a study in which they interviewed Australian students in Grades 3, 6, 
and 9 about their ideas pertaining to sampling, finding that some of the students 
interviewed understood the roles of randomization and sample size in producing a 
representative sample. Zawojewski and Shaughnessy (2000) noted that about two- 
thirds of eighth-grade U.S. students taking the 1996 NAEP could correctly choose 
the sampling method that would provide the least biased results when given 
several choices of sampling methods in a multiple choice question. Given these 
findings, it seems reasonable to expect students to develop the ability to choose 
appropriate methods for sampling during their high school years. 

Another essential part of effective statistical study design is deciding when 
and how to conduct experimental studies rather than non-experimental ones. This 
can be challenging even for college students. Heaton and Mickelson (2002) found 
that undergraduates had some difficulty matching appropriate data collection 
methods to the quantifiable questions they had posed for class projects. Derry, 
Levin, Osana, Jones, and Peterson (2000) described the development of 
undergraduates' statistical thinking ability in regard to study design, finding that 
students showed significant gains in knowledge of the design of convincing 
experiments and the concept of random sampling during the course. Despite the 
overall gains, however, many students still tended to confuse the concepts of 
random sampling and random assignment after the course. Given the difficulties 
college students have exhibited with deciding when and how to conduct 
experiments, one would expect experimental design to be a non-trivial matter for 
high school students. 

The research literature described in this section highlights some of the relevant 
aspects that high school students need to attend to as they design statistical 
studies. For any given quantifiable question of interest, students must determine 
whether an experimental or a non-experimental method is appropriate. Once that 
determination has been made, formal methods can be used in order to implement 
the design of the study. If non-experimental survey methods are appropriate, 
students need to produce representative samples from the population of interest 
and eliminate any possible bias. Formal tools such as drawing random samples can 
be incorporated. If experimental methods are appropriate, experimental designs 
can be enhanced by the use of formal principles such as random assignments. 
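
The distinction between random sampling and random assignment, which the literature above reports students often confuse, can be illustrated with a minimal Python sketch. The function names, the population of 1000 residents, and the sample size of 20 are hypothetical and chosen only for illustration; they do not come from any of the studies cited.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Random sampling: decide WHICH units from the population enter the study."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def randomly_assign(participants, seed=None):
    """Random assignment: decide WHICH GROUP each enrolled participant joins."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical population of 1000 residents (labels are made up).
population = [f"resident_{i}" for i in range(1000)]

# Survey-style step: select who is studied.
sample = draw_random_sample(population, 20, seed=1)

# Experiment-style step: split those already selected into two groups.
treatment, control = randomly_assign(sample, seed=2)
```

Sampling governs how well the participants represent the population, whereas assignment governs how comparable the treatment and control groups are; a study can use either, both, or neither.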

Theoretical Perspective 

For the present study, the Structure of the Observed Learning Outcome 
(SOLO) Taxonomy formulated by Biggs and Collis (1982, 1991) was used to 
identify cognitive levels in students' responses to questions of study design. The 
model has been used effectively by other researchers to help identify various levels 
of sophistication in statistical thinking. The statistical thinking framework for 
elementary school students of Jones et al. (2000) and the Middle School Students 
Statistical Thinking (M3ST) framework (Mooney, 2002) are both based upon the 
SOLO Taxonomy. In addition, Watson, Moritz, and colleagues have conducted 
several studies in which the cognitive theory of Biggs and Collis was used to 
describe the relative sophistication of students' responses to statistical thinking 
tasks (e.g., Watson, Collis, Callingham, & Moritz, 1995; Watson & Moritz, 1999a, 
1999b, 2000a, 2000b). 

Biggs (1999) described how the SOLO Taxonomy identifies a hierarchy of 
responses to academic tasks, including five levels: Prestructural, Unistructural, 
Multistructural, Relational, and Extended Abstract. Prestructural responses show 
little evidence of learning relevant to the task at hand. Unistructural responses 
focus upon one relevant aspect of the task while missing several others. 
Multistructural responses incorporate more than one relevant aspect, but there is 
no unifying theme given for the aspects. At the Relational Level, a unifying theme 
is apparent along with multiple relevant aspects. Responses at the Extended 
Abstract Level are "breakthrough" responses that are not just coherent applications 
of academic learning, but go beyond the task at hand to apply the coherent whole 
to new areas. The first four levels in the SOLO model served to help differentiate 
among levels of response in the present study. 

Some research has indicated that the Unistructural, Multistructural, and 
Relational Levels may occur in two (or more) cycles in empirical data (e.g., 
Campbell, Watson, & Collis, 1992; Pegg, 1992). As a consequence, the SOLO 
Taxonomy can be described in terms of multiple Unistructural-Multistructural- 
Relational (UMR) cycles. The middle three levels in SOLO comprise one UMR 
cycle, whereas the lowest level (Prestructural) relates to a less sophisticated 
preliminary UMR cycle, and the highest level (Extended Abstract) is likely to relate 
to the Unistructural Level in a more sophisticated UMR cycle. In practical terms 
there is a focus for each UMR cycle that builds up, using single, then multiple, and 
finally related elements. 
Methodology 

A qualitative design was chosen for the present study in order to investigate 
intricate thinking processes (Bogdan & Biklen, 1992; Merriam, 1988; Strauss & 
Corbin, 1990). Within the qualitative design, task-based clinical interviews were 
used as a primary means of data collection. Goldin (2000) noted that task-based 
interviews allow researchers to "focus research attention more directly on the 
subjects' processes of addressing mathematical tasks, rather than just on the 
patterns of correct and incorrect answers in the results they produce" (p. 520). 

Participants 

Purposeful sampling (Patton, 1990) was used in the selection of study 
participants. The goal of purposeful sampling is to "select information-rich cases 
whose study will illuminate the questions under study" (Patton, 1990, p. 169). The 
central questions of this study concerned the identification and description of 
different levels of statistical thinking. Therefore, a maximum variation sampling 
strategy (Patton, 1990) was used to select the purposeful sample. In this strategy a 
sample includes people who have had significantly different experiences in some 
area, and it allows the researcher to describe patterns of variation within the group 
studied. In this study, students were chosen on the basis of the different types of 
mathematics courses they had taken while in high school. The goal was then to 
describe the variation within the responses to interview tasks given by these 
students. Maximum variation samples are not drawn in order to make statistical 
generalizations to a larger population, but rather to describe variation and 
significant patterns within the group (Patton, 1990). Accordingly, this study does 
not seek to generalise findings to all high school students, but instead to describe 
the patterns in thinking observed among the diverse sample chosen. 

There were three different categories of participants interviewed for the study. 
The participants in the first category were college freshmen who had recently 
completed a semester-long high school statistics course. Participants in the second 
category were college freshmen who had recently completed a year-long high 
school statistics course. The participants in the third category were still in high 
school at the time of the clinical interviews. Each of the participants in the third 
category was enrolled in Algebra, Geometry, Advanced Placement (AP) Calculus, 
or AP Statistics at the time of the interview sessions. In total, fifteen students 
participated in clinical interviews, three from the first category, four from the 
second, and eight from the third. Students were recruited from each of the three 
categories by contacting university instructors and high school teachers, who 
mentioned the study in their classes and asked for volunteers. Table 1 summarises 
the academic background information about the students interviewed. 

Table 1 

Background and Current Enrolment of Students Participating in the Study 

Student    Class at time of study    Length of statistics course    Course enrolled in 
                                     completed at high school       at time of study 

Lisa       College freshman          One semester 
Kristen    College freshman          One semester 
Laura      College freshman          One semester 
Jeff       College freshman          One year 
Hillary    College freshman          One year 
Paul       College freshman          One year 
Julie      College freshman          One year 
Crystal    High school senior                                       AP Statistics 
Bill       High school senior                                       AP Calculus 
Luke       High school senior                                       AP Calculus 
Jessica    High school sophomore                                    Geometry 
Nancy      High school sophomore                                    Geometry 
Brooke     High school sophomore                                    Algebra 
Rick       High school sophomore                                    Algebra 
Daniel     High school freshman                                     Honours Geometry 

Note. In general, age breakdowns for high school classes in the United States are as follows: freshman = 
14-15 yrs., sophomore = 15-16 yrs., junior = 16-17 yrs., senior = 17-18 yrs. 


Although college freshmen were included in the study, their patterns of 
thinking were assumed to be similar to patterns one would expect to find among 
high school students. The participating college freshmen had all graduated from 
high school within the past six months, their interviews were all conducted during 
the first three months of the academic year, and none of them were enrolled in 
college courses that progressed significantly beyond the statistical content 
encountered in the AP Statistics course (College Entrance Examination Board, 
2001) that is becoming common in United States high schools. The college 
freshmen were included in the study in order to obtain the perspectives of students 
who had completed a statistics course while in high school. 

Interview Protocol 

The interview tasks that are reported upon in this paper are shown in Figures 1 
and 2. The tasks were part of a larger interview script (Groth, 2003). The specific 
content for each of the tasks came from current curricular recommendations for 
high school statistics courses (Cobb & Moore, 1997; College Entrance Examination 
Board, 2001; NCTM, 1989, 2000). Important statistical thinking skills for high 
school students include being able to "understand the differences among various 
kinds of studies and which types of inferences can legitimately be drawn from 
each" and to "know the characteristics of well-designed studies, including the role 
of randomization in surveys" (NCTM, 2000, p. 324). In Tasks 1a, 1b, 1c, and 1e 
(Figure 1), students were asked to design studies to answer questions of interest 
about people who live in the state of Florida. No study design was imposed upon 
the students, and they were free to approach the questions in any manner they 
deemed reasonable. In Task 2 (Figure 2), students were to expand a survey that 
had taken place in one state in order to include the entire United States. 
Suppose that the governor of Florida puts you in charge of finding answers to the 
following questions: 

(a) What is the typical income of adults in the state? 

(b) Will I be re-elected in the election this fall? 

(c) What percentage of the state is computer-literate? 

(d) Does the new drug for treating the West Nile virus actually work? 

(e) How successful was the law which raised the minimum driving age from 16 
to 18? 

Describe a plan for gathering the information you would need in order to answer 
each of the questions, and how you would carry out each plan and report the 
results to the governor. 


Figure 1. Interview Task 1. 

Item 1(d) relates to the design of an experimental study. 


Suppose that in 1999, a newspaper reporter took a random sample of 15 
department stores from the state of Illinois. For each department store he sampled, 
he found out how much the highest paid man and the highest paid woman in the 
department store were paid per hour. The reporter wants to repeat this survey 
again in 2005 and expand it to the population of the entire United States. Describe a 
plan he could use for getting the information needed for the survey. 


Figure 2. Interview Task 2. 

The NCTM (2000) has recommended that high school students understand 
experimental studies and be able to conduct them in order to answer quantifiable 
questions of interest. Moore (1997) distinguished between an observational study 
and an experiment, saying "an observational study observes individuals and 
measures variables of interest but does not attempt to influence the responses", 
whereas "an experiment, on the other hand, deliberately imposes some treatment 
on individuals in order to observe their responses" (p. 129). Moore (1997) went on 
to say that 

An observational study, even one based on a statistical sample, is a poor way to 
gauge the effect of an intervention. To see how nature responds to a change, we 
must actually impose the change. When our goal is understanding cause and 
effect, experiments are the only source of fully convincing data. (p. 129) 

Researchers have begun to investigate how to design instruction in order to 
help students learn the principles of experimental design (e.g., Derry, et al., 2000). 
In order to supplement the current research efforts underway in this area, a task 
was included on the interview protocol (Task 1d) that elicited students' thinking 
about experimental design. The remainder of the tasks elicited thinking about non- 
experimental study design. 
Procedure 

Each of the interview participants was informed that it would take a total of 
approximately 2-3 hours to answer all of the interview questions. Six of the college 
freshmen decided to split the total interview time between two interview sessions. 
Julie was the only college freshman who decided to do the entire 
interview in one session. Most of the participants still at high school were 
interviewed during study hall periods, so the total interview time for these 
students was generally split among three different 50-minute study hall periods. 
Only one of the students enrolled at high school, Daniel, decided to do the entire 
interview in one sitting by scheduling an interview session after school hours. 

During the clinical interview sessions, the interview protocol (Groth, 2003) was 
administered in the same order to each of the students. Task 1 (Figure 1) was posed 
at the beginning of each interview session. Task 2 (Figure 2) was asked 
approximately midway through each interview. As students were interviewed, 
data were collected by taking field notes, audio recording responses, and keeping 
any written work they completed. The audio recordings were later transcribed for 
analysis. 

Data Analysis 

Analysis of interview transcripts was informed by the constant comparative 
method described by Maykut and Morehouse (1994). One of the defining features 
of this method is that data analysis takes place concurrently with data collection. 
As the data were collected, the responses were examined and grouped according 
to their relative sophistication. Sophistication was judged in terms of the statistical 
appropriateness of the response, the number of relevant aspects incorporated in 
the response, and how well connections were made among the relevant aspects 
incorporated. After responses had been sorted into groups containing similar 
patterns of thinking, descriptors were written to capture the essence of each 
different pattern identified. The end result of the constant comparative method of 
data analysis was the generation of sets of descriptors for different patterns of 
thinking regarding both experimental (Task 1d) and non-experimental (Tasks 1a, b, 
c, e, and Task 2) study design tasks. 

Some of the aspects of a check-coding procedure described by Miles and 
Huberman (1994, p. 64) were used to finish the process of data analysis. First, a 
random sample of six students was drawn. Two researchers familiar with the Biggs 
and Collis cognitive framework who had not yet seen the data analysed responses 
given by each of these six students. The two researchers categorised the student 
responses according to the descriptors the author formulated during the 
previously described data analysis process. The two researchers then met with the 
author to discuss the conclusions they had reached during data analysis. In cases 
where disagreement occurred about categorizations of students by descriptors or 
the evidence supporting the formulation of sets of descriptors, the disagreements 
were discussed until consensus was reached. In cases where disagreements 
occurred the originally written descriptors were revised or student responses were 
re-categorised. Some of the descriptors and categorisations were later further 
refined on the basis of reviewers' comments made on an earlier draft of this paper. 
Results 

The discussion of results is divided into two main parts. The first part 
describes patterns of thinking observed in response to the tasks lending themselves 
to non-experimental study design (Tasks 1a, b, c, e, and Task 2). The second part 
describes patterns of thinking observed in response to the task lending itself to an 
experimental study design (Task 1d). A final section presents the association of the 
levels. 

Designing a Non-Experimental Study 

Students' responses to the non-experimental tasks led to the formulation of the 
descriptors for patterns of thinking summarised in Table 2. The patterns show 
increasing appreciation of the design of a non-experimental study, principally in 
terms of recognising and combining elements for a successful design, the focus of 
the UMR cycle. At the Prestructural Level responses did not appreciate the need 
for a design strategy but the awareness of the existence of other information of 
relevance was likely to have been associated with a previous UMR cycle of 
learning. Responses recognising only the need for data collection were classified as 
Unistructural in terms of the focus on design. Responses that discussed the 
desirability of representative samples were considered more sophisticated than 
those that did not (Watson & Moritz, 2000a; Zawojewski & Shaughnessy, 2000) and 
the combination of this with the appreciation of the need for data collection 
constituted a Multistructural response in the cycle. In addition, responses that 
incorporated methods to ensure representative samples were considered more 
sophisticated than those that did not and these were classified as Relational. 


Table 2 

Levels of Thinking Associated with Designing a Non-Experimental Study 

Pattern descriptor                  Students whose responses    SOLO level associated 
                                    reflected the pattern       with the pattern 

No design strategies articulated,   Nancy                       Prestructural 
but is aware of the existence of 
studies and empirical data. 

Data collection without concern     Brooke, Julie, Jessica,     Unistructural 
for representativeness.             Laura 

Data collection with concern for    Jeff, Lisa                  Multistructural 
representativeness. 

Data collection with concern for    Bill, Crystal, Daniel,      Relational 
representativeness and one or       Hillary, Kristen, Luke, 
more methods to ensure it.          Paul, Rick 


In Table 2 and the following discussion, students' names are categorised 
according to the highest level they attained in answering the interview questions 
pertaining to non-experimental study design, so that, for example, a student who 
answered Task 1a at a Prestructural Level and then answered Task 1b at a 
Multistructural Level would have his/her name associated with the 
Multistructural Level. 
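
The categorisation rule just described (each student is placed at the highest level reached across the non-experimental tasks) can be sketched in a few lines of Python. The function name and the dictionary layout are hypothetical conveniences for illustration, not part of the study's analysis procedure; only the ordering of the four SOLO levels is taken from the text.

```python
# The four SOLO levels used in the study, ordered from least to most sophisticated.
SOLO_ORDER = ["Prestructural", "Unistructural", "Multistructural", "Relational"]


def highest_level(levels_by_task):
    """Return the most sophisticated level found among a student's task responses."""
    return max(levels_by_task.values(), key=SOLO_ORDER.index)


# The hypothetical student from the example in the text.
example_student = {"1a": "Prestructural", "1b": "Multistructural"}
overall = highest_level(example_student)  # "Multistructural"
```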
Prestructural Level. The least sophisticated pattern evident in response to 
designing a non-experimental study was that exhibited by Nancy. She relied 
primarily on pre-existing studies in order to gather the information needed to 
answer each question posed. She briefly mentioned interviewing people to 
determine whether or not the governor would be re-elected in the next election 
(Task 1b), but the idea was not developed. She said she would make use of 
pre-existing information and studies in books, periodicals, and the internet in order to 
answer each of the other questions in the tasks. For example, when asked to 
determine the success of a law that raised the minimum driving age from 16 to 18, 
she stated: 

Well, it seems like I'm relying on the internet a lot, but that's basically how I 
would. I guess you'd have to look up the accident claims from insurance 
companies and see if the claims were higher or lower or whatever after the law 
was passed. 

Although she recognised that empirical data would be useful for answering the 
questions of interest, she relied on others to gather the data for her rather than 
developing her own data gathering techniques. 

Nancy's response to Task 2 also lacked reference to any original empirical data 
gathering techniques. When asked how she would extend the survey described to 
the entire United States, she stated, "Well, what I would do to make it easier, I 
would do a study like this [study described in the problem], but I would do one for 
every state, and then I would probably average them out from there." Her study 
design incorporated no new aspects beyond those already described in the context 
of the problem. Further, there was no evidence that she understood the purpose of 
the data gathering techniques described in the problem context. 

Nancy's responses to the study design tasks reflected the Prestructural Level in 
the SOLO Taxonomy. Biggs (1999) noted that Prestructural responses show little 
evidence of learning relevant to the task at hand, and are often tautological in 
nature. Nancy's responses to each part of Task 1 gave little evidence that she had 
developed understanding of how empirical data gathering techniques are relevant 
to the task of study design. Further, her response given to Task 2 was tautological, 
in that it was essentially a restatement of the problem with no original ideas for 
study design provided. Hence, Nancy's responses to the tasks involving non- 
experimental study design were categorized as Prestructural. 

Unistructural Level. In the next pattern identified, students at some point 
recognised the need for empirical data and also began to develop their own ideas 
for gathering the data. They discussed their data gathering techniques, however, 
without expressing concern that the data they would gather would be 
representative of the population from which they were drawn. Brooke, for 
example, proposed the following plan for predicting whether or not the governor 
would be re-elected in the fall election (Task 1b): 

In response to the election question, I think I would do some kind of poll that 
people could answer on the internet or, like, by responding to a phone number in 
the newspaper. And you also would send door to door scouts out, that ask people 
if they vote, are they planning on voting for the governor. Then I would probably 
just display that by a percentage of people that we polled. 

Although she discussed a number of data gathering techniques, none of them 
ensured that the sample drawn would be representative of the overall population. 
In fact, the data gathering techniques proposed by students whose responses 
reflected this pattern were quite likely to produce non-representative samples. 

Whitney's approach to determining the typical income of people in the state of 
Florida (Task 1a) further exemplified the use of data gathering techniques without 
concern for obtaining representative samples. In response to Task 1a, she stated, 

I would probably conduct some kind of a survey or a poll just to see, you know, 
what people say. Or, I don't know, maybe talk to businesses, too. Um, see what 
they pay their employees, but that would probably be harder. I probably wouldn't 
talk so much to people as get it written, because then you would have the 
information to work with if you needed to look back on it, things like that. 

Whitney's response to Task 1a was similar to Brooke's response to Task 1b in that 
methods of empirical data collection were proposed, but no mention was made of 
gathering a sample that was representative of the population of interest. 

Responses that incorporated data gathering methods without concern for 
drawing a representative sample from the population were considered to be 
Unistructural. The responses were at least "on track" in the sense that empirical 
data gathering techniques were proposed at some point. The empirical data 
gathering techniques, however, were the only aspect of relevance included in 
responses. The important relevant aspect of representativeness was missing. 

Multistructural Level. Concern for representativeness appeared in responses 
categorized at the Multistructural Level. At this level, students proposed data 
gathering techniques and recognised the importance of obtaining representative 
samples in sampling situations. Jeff, for example, recognised that it would be 
important to obtain data for the governor's re-election (Task 1b) from "a wide 
range of the population from various backgrounds." In response to the same 
question, Lisa felt it was important to "make sure that there's all types of people, 
and that there is no economic bias" within a surveyed sample. She did not, 
however, offer a strategy for obtaining such a representative sample. In general, 
the students responding with this pattern not only proposed data gathering 
techniques, but also at some point articulated that it was important for the data 
gathered to be representative of the larger population. The study designs proposed 
in the highest level responses given by both Jeff and Lisa incorporated two aspects 
relevant to study design: (a) use of data gathering techniques; and (b) concern for 
representativeness. Since the two relevant aspects were incorporated, but not 
unified by a coherent strategy for obtaining representativeness, the responses were 
considered Multistructural. The unification of the two relevant aspects came about 
in responses categorized at the next level. 

Relational Level. Students giving responses reflecting a more sophisticated 
pattern of thinking proposed methods to ensure that the samples drawn when 
gathering empirical data would be representative of the population of interest. Bill, 
for example, suggested the novel strategy of using stratified sampling when asked 
to expand the department store survey to the entire United States (Task 2), saying, 

You could take a direct approach, call each department store, that's just crazy. Or 
like the census bureau does, you don't have to go to every single house. You could 
go to, I mean, you could research department store wages in a couple of states, 
north, east, south, west, all around, and then get an average of that. 

Daniel suggested a random sampling strategy in response to the same question, 
saying, "He could pick some cities at random, and pick one man and one woman 
from a department store from that city chosen at random." 

Some students incorporated both random and stratified sampling in the design 
of studies. Paul's response to the question of determining the typical income of 
adults in the state of Florida (Task 1a) illustrated this in the following way. 

OK, um, there's a couple of ways, I guess, you could do this. One would be to do a 
census, you know, and take information from everyone. You'd have to find ways 
to get to everyone, so they'd have to mail it back, and make sure people - send 
people out to go find them - that's one way. Another way would be to do a simple 
random sample, I might stratify the districts, or the counties, I guess. So, you'd 
have the same amount if there's urban, rural, suburban areas. Then take the same 
amount - or even it out that way. Then I guess you'd just find the mean salary for 
each of the adult households. 

Responses at this level went beyond the Multistructural Level, not only mentioning 
the two relevant aspects of data gathering and representativeness, but also 
incorporating data gathering methods for producing representative samples. 
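
The kind of design Paul described, stratifying by area type and then drawing the same number of units at random from each stratum, can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling frame, the area labels, the function name `stratified_sample`, and the sample sizes are hypothetical and do not come from the interviews.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` units at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(per_stratum, len(members))
        # Simple random sample without replacement within the stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame of (household id, area type) pairs
areas = ["urban"] * 60 + ["suburban"] * 30 + ["rural"] * 10
frame = list(enumerate(areas))

sample = stratified_sample(frame, stratum_of=lambda u: u[1], per_stratum=5)
```

Equal allocation is used here because it mirrors Paul's "same amount" from each kind of area; allocating in proportion to stratum size is another common choice.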

Designing an Experimental Study 

In the task to explore understanding of designing an experimental study 
(Task 1d), students were asked to evaluate the effectiveness of a hypothetical drug 
that had just been developed. Four different patterns were identified in the 
responses to the task. They are summarised in Table 3 and have structural 
similarities to the levels of response reported for non-experimental studies in 
Table 2. 


Table 3 

Levels of Thinking Associated with Designing an Experimental Study 

Pattern descriptor                      Students whose responses     SOLO level associated
                                        reflected the pattern        with the pattern

Relies solely upon pre-existing         Nancy                        Prestructural
studies and artefacts.

Data collection without experimental    Brooke, Hillary, Jeff,       Unistructural
method.                                 Jessica, Laura, Rick

Data collection with experimental       Bill, Daniel, Kristen,       Multistructural
method.                                 Luke

Data collection with experimental       Crystal, Julie, Lisa,        Relational
method and one or more controls         Paul
to ensure integrity of the
experiment.


At the Prestructural Level, responses rely on pre-existing information likely to 
be related to an earlier learning cycle. The next three levels again show increasing 
inclusion of elements that are associated with appropriate experimental design: 
data collection, data collection and experimental method, and data collection and 
experimental method related by controls for integrity. 

Prestructural Level. Nancy's response to the task illustrated the least 
sophisticated pattern of response evident. When asked how she would evaluate 
the effectiveness of the new drug that had been developed, she stated, 

Well, I guess you'd have to go to like a direct source from ... I mean if you couldn't 
get the answers in a book or a periodical or online or something ... If you could 
actually find someone who had actually encountered the virus and was working 
on a drug for it, if you had that kind of access. 

She relied solely upon pre-existing studies done by "experts" in order to answer 
the given question. Since Nancy's response gives no hint that she would design her 
own study to collect data, it was considered Prestructural. Her response indicated 
that she would rely solely on studies done by others. No mention was made of 
what methods should be employed for a study to be deemed trustworthy. 

Unistructural Level. In responses reflecting the next pattern, students proposed 
their own methods for gathering data in order to answer the question. The study 
designs, however, were observational rather than experimental in nature. Jessica, 
for example, proposed gathering information about success and failure from 
people who had used the drug, stating, 

I would speak to doctors, see what their opinion is. But, probably more 
importantly, talk to people who have contracted the disease and have taken the 
drug to see how they felt, have their symptoms got better, and how it worked out 
for them. 

Jessica's response began much like Nancy's, in that both initially 
proposed going to an "expert" source in order to determine the efficacy of the 
drug. Jessica's response, however, was more complex in that she also proposed a 
data gathering technique that could be used in an original study design. She 
decided that talking to people who had contracted the disease might yield 
information pertinent to answering the question at hand. 

Brooke's response further illustrated the pattern of thinking in which data 
gathering methods were mentioned. Like Jessica, Brooke suggested gathering 
observational data in order to answer the question, stating, 

For the drug one, I think that would be pretty easy, because you would just have to 
go to the hospital and see how many had been given the vaccine. And then (to see) 
if it works . . . you would just have to look at the comments on the chart and see if 
they had to come back or not. Then you could kind of just make up a little average, 
I guess, of how many times it did work, and if it was good enough. You would 
probably have to have a pretty high percentage for it to actually work for most 
people. So, maybe you would have to compare ages, maybe for like certain ages it 
works better, or what not. 

Responses fitting the Unistructural Level incorporated data gathering 
techniques of an observational nature rather than selecting one group to receive the 
medicine and another not to receive it. The only aspect of design relevant to 
experimental design incorporated in the responses was the gathering of empirical 
data. Since this was the only relevant aspect present in the responses, the responses 
fitting the pattern, illustrated above by Jessica and Brooke, were considered 
Unistructural. 

Multistructural Level. The incorporation of a second relevant aspect of 
experimental design occurred in Multistructural responses. Bill suggested testing 
the drug on animals. Daniel suggested finding some people who had the virus and 
testing the drug on them. Kristen and Luke both suggested analysing the results of 
clinical tests without naming specific experimental subjects. Each response 
reflecting this pattern indicated the recognition of the possibility of imposing a 
treatment upon a group of subjects in order to answer the question of interest. 

Bill's suggested strategy helps illustrate the Multistructural Level. He stated, 

You could probably go to your science department on that one. Have them 
probably first test the animals. And, I mean further research would see if it worked 
or not. You'd have to test even before you brought it out to the public, for one. I 
mean, once you figured out if it worked or not, you would let the public know. 

In Multistructural responses such as Bill's, the relevant aspect of data collection 
was incorporated along with the relevant aspect of the imposition of a treatment 
upon individual subjects. The responses, however, simply incorporated the two 
aspects and did not unify them by suggesting methods by which the integrity of 
the data obtained by experimental methods could be ensured. 

Relational Level. Methods for maintaining the integrity of experimental data 
were mentioned in responses reflecting the Relational Level. In these responses, 
students not only recognised the possibility of conducting an experiment to answer 
the question of interest, but also proposed controls to be put on the experiments in 
order to help ensure that the experiment would produce trustworthy results. 
Crystal proposed, 

For this one, I would take a group of people who actually have the West Nile virus, 
and I would make sure that they are all in the same stage, so that some aren't 
worse off than others. Then I would give part of the people a fake-type drug, and 
one the real drug, and see how the differences in their improvement turn out. 

Julie's strategy for testing the drug further illustrates the Relational Level 
pattern. She stated, 

OK, well you would have to do some sort of experiment where you have the actual 
drug, and then, like a placebo. Two groups are chosen at random and put into 
random groups. Um, and it would help if it was double blind. And, just have one 
group taking the new drug and one group taking the placebo. 

The pattern illustrated by the responses of Crystal and Julie was considered 
Relational in nature because the two relevant aspects of data collection and 
imposition of treatment were not just included, but also tied together by methods 
aimed at producing trustworthy experimental data. 
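
The random allocation that Julie described can be illustrated with a short sketch. It is not a procedure taken from the study: the subject labels, the group size, and the function name `randomly_assign` are assumptions made for the example, and blinding is a matter of study procedure that code of this kind does not capture.

```python
import random


def randomly_assign(subjects, seed=0):
    """Shuffle the subjects and split them into drug and placebo groups."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"drug": shuffled[:half], "placebo": shuffled[half:]}


# Hypothetical pool of subjects who all have the condition of interest
subjects = [f"subject_{i:02d}" for i in range(20)]
groups = randomly_assign(subjects)
```

Because chance alone decides group membership, differences between subjects (such as the stage of illness Crystal wanted to hold constant) tend to balance out across the two groups rather than favouring one of them.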

Association of Levels for the Two Types of Task 

Table 4 shows the manner in which the levels of response exhibited by 
individual students for the two types of task relate to each other. Nancy, Brooke, 
Jessica, Laura, Crystal, and Paul exhibited the same levels of response for both 
experimental and non-experimental study design tasks. Lisa, Jeff, Bill, Daniel, 
Kristen, and Luke gave responses for the two types of task that fell within one 
developmental level of each other. For the majority of students interviewed, 
therefore, the level of response for the experimental task was an exact match or else 
very close to the level of response for the non-experimental tasks. 

Table 4 does, however, show that three students gave responses for the two 
types of task that were not within one cognitive level of each other. Julie exhibited 
Unistructural thinking for non-experimental design and Relational thinking for 
experimental design. Hillary and Rick exhibited Relational thinking for non- 
experimental design and Unistructural thinking for experimental design. This 
illustrates that levels of thinking exhibited in regard to the two types of task can 
differ substantially. 

The fact that seven students performed at higher levels on the non-experimental 
tasks than on the experimental task, whereas only two performed at a 
higher level on the experimental task, may deserve further attention. The 
differences may reflect (i) a lack of exposure to experimental settings in the 
students' previous learning, (ii) the inherent greater difficulty of designing 
experimental studies, or (iii) the fact that there were fewer experimental tasks on 
which students could show their competence. 
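
The counts quoted in this section can be re-derived from the per-student levels. The sketch below encodes each student's pair of levels as read from Tables 3 and 4 and tallies the comparisons; the variable and function names are illustrative only.

```python
LEVELS = ["Prestructural", "Unistructural", "Multistructural", "Relational"]

# (non-experimental level, experimental level), as read from Tables 3 and 4
levels = {
    "Nancy": ("Prestructural", "Prestructural"),
    "Brooke": ("Unistructural", "Unistructural"),
    "Jessica": ("Unistructural", "Unistructural"),
    "Laura": ("Unistructural", "Unistructural"),
    "Julie": ("Unistructural", "Relational"),
    "Jeff": ("Multistructural", "Unistructural"),
    "Lisa": ("Multistructural", "Relational"),
    "Hillary": ("Relational", "Unistructural"),
    "Rick": ("Relational", "Unistructural"),
    "Bill": ("Relational", "Multistructural"),
    "Daniel": ("Relational", "Multistructural"),
    "Kristen": ("Relational", "Multistructural"),
    "Luke": ("Relational", "Multistructural"),
    "Crystal": ("Relational", "Relational"),
    "Paul": ("Relational", "Relational"),
}


def gap(student):
    """Experimental level minus non-experimental level, in SOLO steps."""
    non_exp, exp = levels[student]
    return LEVELS.index(exp) - LEVELS.index(non_exp)


same = sorted(s for s in levels if gap(s) == 0)              # 6 students
within_one = sorted(s for s in levels if abs(gap(s)) == 1)   # 6 students
further_apart = sorted(s for s in levels if abs(gap(s)) > 1) # Hillary, Julie, Rick
higher_non_exp = sorted(s for s in levels if gap(s) < 0)     # 7 students
higher_exp = sorted(s for s in levels if gap(s) > 0)         # Julie, Lisa
```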


Table 4 

Association of Levels for the Two Types of Task 

                    SOLO Level of Response to Experimental Design Task
SOLO Level of
Response to Non-
Experimental
Design Tasks        Prestructural     Unistructural     Multistructural   Relational

Prestructural       Nancy
Unistructural                         Brooke, Jessica,                    Julie
                                      Laura
Multistructural                       Jeff                                Lisa
Relational                            Hillary, Rick     Bill, Daniel,     Crystal, Paul
                                                        Kristen, Luke


Another surprising relationship revealed in Table 4 is the fact that the college 
students who had taken a statistics course while in high school did not necessarily 
outperform the other students. Luke, for example, performed almost as well as 
Crystal and Paul on each type of task. Jeff, who had taken a year-long statistics 
course in high school, exhibited levels of thinking below those of Bill, Daniel, and 
Luke, who had not taken a statistics course. This suggests that students who have 
taken a high school statistics course do not necessarily exhibit thinking about study 
design issues at a higher level than those who have not taken such a course. 

Discussion 

The present study seems to raise at least as many questions as it answers. 
Although it provides a picture of levels of thinking one might expect from students 
during and immediately after the high school years, much follow-up research 
remains to be done. It is hoped that the present study will serve as a catalyst for 
this follow-up research. In this section the limitations of the present study are 
discussed and suggestions made for directions for further research. 

Limitations 

The picture of students' thinking in regard to experimental design is somewhat 
limited because students answered only one item related to designing an 
experiment. It is not known how changing the context of the item would have 
influenced the levels of response exhibited by the students. It is possible that an item 
set in a different context would have prompted students to respond at either 
higher or lower levels. 

The present study provides snapshots of students' thinking for the purpose of 
informing a proposed hierarchy of levels of response high school students could be 
expected to give to study design tasks. The amount of information gathered per 
student is not sufficient to pinpoint the exact developmental level of each student 
involved. The purpose of the study, however, was to give a broad picture of 
different levels of sophistication in response to study design tasks. 

It is likely that the levels of sophistication described in the present study could 
be refined by a similar study involving a larger sample of students. Although an 
effort was made in the present study to obtain data from students representing a 
range of mathematical backgrounds, it cannot claim to have documented all of the 
characteristics of possible patterns of response that one might encounter. 

Directions for Further Research 

Some directions for further research are implied by the limitations mentioned 
above. One direction for further research would be to describe how changing the 
context of an experimental design problem changes the level of response elicited 
from individual students. Another possibility would be to replicate the present 
study and include a larger sample of students from various backgrounds. 

Another direction for further research would be to continue to investigate the 
relationship between students' responses to experimental design tasks and their 
responses to non-experimental design tasks. The present study suggests that an 
individual student's level of thinking in regard to experimental study design may 
differ substantially from his or her level of thinking in regard to non-experimental 
study design. Further research might help to determine the reasons why some 
students seem to respond at quite different levels for the two types of tasks. 

One more direction for further studies would involve replicating the present 
study with a younger group of students. It seems that there may be a UMR cycle of 
lesser sophistication leading up to the one documented for the high school 
students. Nancy's responses, classified at the Prestructural Level, may also reflect 
the Relational Level of a UMR cycle in which students gradually become aware of 
the importance of research but do not develop their own strategies for statistical 
study design. Interviews with a large sample of younger students could help to 
determine the characteristics of the levels within the UMR cycle in which Nancy's 
responses seem to fit. 


Conclusion 

The current study provides a working framework for describing high school 
students' thinking in regard to statistical study design. This working framework 
has the potential to contribute to both teaching and research. It provides teachers 
of high school age students an indication of some of the levels of thinking they can 
expect to encounter in their classes. It provides researchers with starting points for 
further investigation of levels of thinking about statistical study design. By 
providing insight for teaching and further research, it is hoped that the study will 
ultimately be a contributing factor in enhancing student learning in the area of 
statistical study design. 